


Cryptolysis v. 0.0.1 - A Framework for Automated Cryptanalysis of Classical Ciphers

CIISE Security Investigation Initiative 

Represented by: 



Serguei A. Mokhov 
Marc-Andre Laverdiere 



Nader Hatami
Ali Benssam



{mokhov,ma_laver,nade_hat,al_ben}@ciise.concordia.ca



Montreal, Quebec, Canada 



Fri 7 Jan 2011 15:21:19 EST 



Contents 



1 Introduction
1.1 What is Cryptolysis?
1.2 Tools
1.2.1 Java
1.2.2 MARF

2 Design and Architecture

3 Methodology
3.1 Cryptanalysis Heuristics
3.2 Word Boundary Detection
3.2.1 Dictionary Approach
3.2.2 Character N-gram Language Model
3.2.3 Statistical Parsing

4 Applications

5 Results

6 Conclusions
6.1 Acknowledgments

Bibliography

A Cryptolysis Application Source Code



List of Figures 

2.1 Cryptolysis Packages
2.2 Ciphers Framework
2.3 Crypto Analyzers Framework
2.4 Use of Modules By the Cryptolysis Application






List of Tables 






Chapter 1 



Introduction 

Revision : 1.2 

1.1 What is Cryptolysis? 

Cryptolysis is a framework that includes a collection of automated attacks on classical ciphers, based
on the article [CD98].

1.2 Tools 

1.2.1 Java 

We chose to implement the project in the Java programming language. Java applications are binary-portable,
and the language takes care of memory management and similar low-level concerns, so we can
concentrate on the algorithms instead. Java also provides built-in types and data structures for
managing collections (building, sorting, storing, and retrieving) efficiently [Fla97].

1.2.2 MARF 

Portions of Cryptolysis rely on MARF, which is described in many published works; we cite just one
here for follow-up information [The10].
Chapter 2

Design and Architecture 



Revision : 1.2 

Before we begin, you should understand the basic system architecture. Understanding how the parts
interact will make the following sections clearer. This chapter presents the architecture of the
Cryptolysis system, including the layout of the physical directory structure and the Java packages.

Figure 2.1 lists the Java packages. Figure 2.2 presents the Ciphers Framework, whose purpose is to
provide quick in-house tools for creating test ciphertexts; its most common API is a series of
encrypt() and decrypt() calls. Figure 2.3 presents the core of this work, namely the modules that
perform the described attacks on ciphertext; they all implement the analyze() method. Figure 2.4
shows how the main application uses (i.e., instantiates) the concrete modules based on the set of
options supplied.
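
As a rough illustration of the encrypt()/decrypt() style of API mentioned above, the following minimal sketch implements a shift cipher. The names SimpleCipher and ShiftSketch are hypothetical and were invented for this example; for brevity the methods return strings, so they do not match the signatures of the framework's own ICipher and Shift classes. The default shift of 3 and the alphabet size of 26 are taken from the Shift class in Figure 2.2.

```java
import java.util.Locale;

/** Hypothetical, simplified stand-in for a cipher interface (not the framework's ICipher). */
interface SimpleCipher {
    String encrypt(String plainText);
    String decrypt(String cipherText);
}

/** Shift (Caesar) cipher over the 26 uppercase English letters; other characters pass through. */
final class ShiftSketch implements SimpleCipher {
    private static final int NUM_LANGUAGE_SYMBOLS = 26;
    private final int shiftConstant;

    ShiftSketch(int shiftConstant) {
        // Normalize so that negative or large shifts still map into 0..25.
        this.shiftConstant = Math.floorMod(shiftConstant, NUM_LANGUAGE_SYMBOLS);
    }

    @Override
    public String encrypt(String plainText) {
        return shift(plainText, shiftConstant);
    }

    @Override
    public String decrypt(String cipherText) {
        // Shifting by the complement undoes the encryption shift.
        return shift(cipherText, NUM_LANGUAGE_SYMBOLS - shiftConstant);
    }

    private static String shift(String text, int by) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                out.append((char) ('A' + (c - 'A' + by) % NUM_LANGUAGE_SYMBOLS));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With a shift of 3, new ShiftSketch(3).encrypt("HELLO") gives "KHOOR", and decrypt() maps it back. Test ciphertexts produced this way are the kind of input the analyzers are meant to attack.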



TODO 
[UML class diagram of the ciphers package. Recoverable contents:]

ICipher (from ciphers)
    encrypt(pstrPlainText : String, pstrCipherText : String) : void
    encrypt(pacPlainText : char[], pacCipherText : char[]) : void
    encrypt(poPlainText : InputStream, poCipherText : OutputStream) : void
    decrypt(pstrCipherText : String, pstrPlainText : String) : void
    decrypt(pacCipherText : char[], pacPlainText : char[]) : void
    decrypt(poCipherText : InputStream, poPlainText : OutputStream) : void

Cipher (from ciphers)
    encrypt(), encrypt(), decrypt(), decrypt()

Shift (from ciphers)
    iShiftConstant : int = 3
    iNumLanguageSymbols : int = 26
    encrypt(), decrypt(), getShiftConstant(), setShiftConstant(), encrypt(), decrypt()

SPN (from ciphers)

Substitution (from ciphers)
    oHashMapEncryptionMapping : HashMap
    oHashMapDecryptionMapping : HashMap
    decrypt(), permute(), Substitution(), setPermutation(), getPermutation(), encrypt(), encrypt()

Figure 2.2: Ciphers Framework 
[UML class diagram of the analysis package. Recoverable contents:]

ICryptoAnalyzer (from analysis)
    analyze(pstrCipherText : String, pstrPlainText : String) : void
    analyze(pacCipherText : char[], pacPlainText : char[]) : void
    analyze(poCipherText : InputStream, poPlainText : OutputStream) : void
    setStatistics(poStats : LanguageStatistics) : void
    calculateKeyCost(poKey : Key, pacCipherText : char[], pacPlainText : char[]) : void

CryptoAnalyzer (from analysis)
    analyze(), analyze()

Statistical (from analysis)
    analyze()

SimulatedAnnealing (from analysis)
    MAX_ITER : long = 10
    TEMPFACTOR : double = 0.95
    NUM_CHARACTERS_IN_ALPHABET : int = 26
    N_TIMES_100 : int = 100 * NUM_CHARACTERS_IN_ALPHABET
    N_TIMES_10 : int = 10 * NUM_CHARACTERS_IN_ALPHABET
    BOLTZMANN_CONSTANT : double = 1.38e-23
    iNumberOfSuccesses : int
    iNumOfCheckingKeys : int
    NEWLINE : int = 10
    SPACE : int = 32
    OFFSET : int = 36
    adSecondOrder[][] : double
    dTemperature : double
    analyze(), SimulatedAnnealing(), populateSecondOrderArray()

GeneticAlgorithm (from analysis)
    analyze()

TabuSearch (from analysis)
    MAX_ITER : int = 30
    TABU_LIST_SIZE : int
    POSS_LIST_SIZE : int
    analyze(), TabuSearch(), analyze(), setStatistics(), findBestKeyInTabuList(),
    findBestKeyInPossList(), findWorstKeyInTabuList(), createNewKey(), initialiseTabuList(),
    isKeyInTabuList(), isKeyInPossList()

Key (from analysis)

StatisticsBuilder (from analysis)
    createStatistics(), addToStatistics(), addToStatistics()

LanguageStatistics (from analysis)
    lNumMonograms : long
    lNumBigrams : long
    hashMapMonogramCount : HashMap
    hashMapBigramCount : HashMap
    oStringBuffer : StringBuffer
    LanguageStatistics(), clear(), trackMonogram(), trackBigram(), getMonogramStatistics(),
    getBigramStatistics(), getBigramStatistics(), computeSuitability(), toString(),
    backSynchronizeObject(), getHashMapBigramCount(), getHashMapMonogramCount(),
    getNumBigrams(), getNumMonograms(), createStringForBigram()

MatchednessChecker (from analysis)
    lNumBytes : long
    lNumMismatches : long
    hashMapMonogramCount : HashMap
    oStringBuffer : StringBuffer
    MatchednessChecker(), computeMatchedness(), trackMonogramMatchedness(),
    getMonogramMatchednessStatistics(), getTrackedMonograms()

Figure 2.3: Crypto Analyzers Framework 
[UML class diagram showing the classes used by the main application. Recoverable contents:]

Cryptolysis (from apps)
    OPT_* option constants, values 1 to 22 (listed in full in Appendix A)
    usage(), main()

Used from analysis: Statistical, SimulatedAnnealing, GeneticAlgorithm, TabuSearch

Used from ciphers: Shift, Substitution, SPN

CryptolysisException (from util)
    CryptolysisException() (four constructors)

DictionaryBasedParser (from util)
    DictionaryBasedParser() (three constructors)
    parse(), train(), setupTokenizer(), pushBackCharacters()

CharacterNgramBasedParser (from util)
    CharacterNgramBasedParser() (three constructors)
    parse(), train(), setupTokenizer()

Figure 2.4: Use of Modules By the Cryptolysis Application 
Chapter 3

Methodology



Revision : 1.4 

3.1 Cryptanalysis Heuristics 

Most of this work is based on [CD98].

3.2 Word Boundary Detection 

Since the deciphered text arrives as a stream of characters with no spaces or punctuation, it is a good
idea, as a bonus, to place spaces between words automatically for ease of reading. In speech
recognition and natural language processing (NLP), this problem is often referred to as word
boundary detection. The two main methods of converting a stream of characters into a stream of
words are a dictionary approach and a statistical approach based on n-gram language models.
The models can be further refined with sentence boundary detection via statistical NLP parsing.
Below are the details of how these two approaches are implemented in Cryptolysis: their
current parameters, advantages, and limitations, and the ways these limitations can be overcome.

All methods require training on source-language corpora^1 prior to use. The trained data
has to be serialized so that it is usable in later runs of the application.

3.2.1 Dictionary Approach 

The dictionary approach is the most common one. First, compile a dictionary of English words
from some training corpora. Then, to detect word boundaries, start at the first character
of the stream and match the longest possible dictionary word that begins there; when a match is
found, insert a space after it and continue from the next character.

1 A corpus (plural: corpora) is a collection of natural language documents used for any sort of natural language processing technique.

The Requirements

• If the dictionary is not imported from an external source, the training corpora must be large so
that they include more words.

• A recovery technique should be in place for when no matching word is found in the dictionary, e.g.,
proceeding further until a word is found and keeping the preceding sequence of characters as if it
were a word.

Problems and Limitations 

• A word is not found. This is especially pertinent to proper names of people, places, etc., which
cannot all be found comprehensively in the training corpora. Likewise, domain-specific
terms and names, equations, etc. may cause this problem.

• Composite words are another issue. Matching the longest possible sequence may miss
a space, because the matcher cannot tell whether a space was there or not. For example,
both "therefore" and "there for eternity" are legal, as in the sentences "Therefore, the
theorem holds." and "This stone was lying there for eternity." Both instances will be picked up
as "therefore", resulting in the non-dictionary word "ternity" in the second example. A similar
problem applies to "thereto", "thereafter", "thereby", and other composite words.

• The storage space required for the dictionary is large and keeps growing as more words are
added during training, which in turn slows down lookups.
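
The greedy longest-match procedure, the recovery rule from the requirements, and the composite-word problem described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of DictionaryBasedParser: the class name GreedySegmenter is hypothetical, the dictionary is an in-memory set of lowercase words, and the recovery rule simply collects unmatched characters into one pseudo-word until the next dictionary match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of greedy longest-match word boundary detection. */
final class GreedySegmenter {
    private final Set<String> dictionary;
    private final int maxWordLength;

    GreedySegmenter(Set<String> dictionary) {
        this.dictionary = dictionary;
        int max = 1;
        for (String word : dictionary) {
            max = Math.max(max, word.length());
        }
        this.maxWordLength = max;
    }

    /** Splits an unspaced character stream into tokens. */
    List<String> segment(String stream) {
        List<String> tokens = new ArrayList<>();
        StringBuilder unknown = new StringBuilder(); // recovery buffer for unmatched characters
        int pos = 0;
        while (pos < stream.length()) {
            int matched = longestMatch(stream, pos);
            if (matched > 0) {
                flush(unknown, tokens);
                tokens.add(stream.substring(pos, pos + matched));
                pos += matched;
            } else {
                // Recovery: keep the character and proceed until a word is found.
                unknown.append(stream.charAt(pos));
                pos++;
            }
        }
        flush(unknown, tokens);
        return tokens;
    }

    /** Length of the longest dictionary word starting at pos, or 0 if there is none. */
    private int longestMatch(String stream, int pos) {
        int limit = Math.min(maxWordLength, stream.length() - pos);
        for (int len = limit; len > 0; len--) {
            if (dictionary.contains(stream.substring(pos, pos + len))) {
                return len;
            }
        }
        return 0;
    }

    /** Emits the buffered unmatched characters as if they were a word. */
    private static void flush(StringBuilder unknown, List<String> tokens) {
        if (unknown.length() > 0) {
            tokens.add(unknown.toString());
            unknown.setLength(0);
        }
    }
}
```

With a dictionary containing "there", "for", "therefore", and "eternity", the stream "thereforeternity" is segmented as "therefore" followed by the non-dictionary token "ternity", which is exactly the failure mode described in the second bullet.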

3.2.2 Character N-gram Language Model 

Unlike the dictionary approach, the character n-gram model looks at sequences of n characters when
looking for spaces; thus, it operates on the source character stream directly and produces the word stream
from probabilistic table lookups. This is a statistical approach. Here we chose a 3-gram model for space
detection. During training, for each pair of adjacent words we count how often the following occur:

• the last 2 characters of the preceding word and a space

• the last character of the preceding word, a space, and the first character of the second word

• a space and the first two characters of the second word

The frequencies are then stored in the probabilistic table, and the table is serialized. Of course, text
boundaries need to be accounted for, as there is no preceding word before the first word of the text and
no following word after the last.

These 2-character sequences (barring the space) are then looked up in the deciphered text to be parsed,
and spaces are inserted according to the probabilities found during training.

The storage space required by this method is much smaller than that of the dictionary approach, and
scanning is faster. However, it may put spaces in undesired places due to irregularities in natural
language.
Problems

• One-character words 

• Creating non-existent words 
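
The training counts and the lookup step of the character n-gram model can be sketched as below. This is a simplified illustration, not the code of CharacterNgramBasedParser: the class name SpaceTrigramModel, the 0.5 threshold used in the usage note, and the decision rule are all assumptions made for this example. Training counts the three space-containing 3-grams listed in Section 3.2.2, but for brevity the decision uses only the middle pattern (character, space, character), compared against how often the same two characters were seen adjacent inside a word.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of space detection with character 3-grams. */
final class SpaceTrigramModel {
    private final Map<String, Integer> counts = new HashMap<>();

    /** Counts space-containing 3-grams and unspaced character pairs in a training text. */
    void train(String corpus) {
        // After trimming and collapsing whitespace, every space lies strictly between two words.
        String text = corpus.trim().replaceAll("\\s+", " ");
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == ' ') {
                // Pattern 1: last two characters of the preceding word, then a space.
                if (i >= 2 && text.charAt(i - 2) != ' ') {
                    bump(text.substring(i - 2, i + 1));
                }
                // Pattern 2: last character, space, first character of the next word.
                bump(text.substring(i - 1, i + 2));
                // Pattern 3: a space, then the first two characters of the next word.
                if (i + 2 < text.length() && text.charAt(i + 2) != ' ') {
                    bump(text.substring(i, i + 3));
                }
            } else if (i + 1 < text.length() && text.charAt(i + 1) != ' ') {
                bump(text.substring(i, i + 2)); // two characters adjacent inside a word
            }
        }
    }

    int count(String ngram) {
        return counts.getOrDefault(ngram, 0);
    }

    /** Estimated probability that a space separates characters a and b. */
    double spaceProbability(char a, char b) {
        int spaced = count("" + a + ' ' + b);
        int joined = count("" + a + b);
        int total = spaced + joined;
        return total == 0 ? 0.0 : (double) spaced / total;
    }

    /** Inserts a space wherever the estimated probability exceeds the threshold. */
    String insertSpaces(String stream, double threshold) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < stream.length(); i++) {
            out.append(stream.charAt(i));
            if (i + 1 < stream.length()
                    && spaceProbability(stream.charAt(i), stream.charAt(i + 1)) > threshold) {
                out.append(' ');
            }
        }
        return out.toString();
    }

    private void bump(String key) {
        counts.merge(key, 1, Integer::sum);
    }
}
```

After training on the toy text "the cat sat on the mat", insertSpaces("thecat", 0.5) returns "the cat". A realistic corpus is needed before the probabilities mean anything, and the problems listed above (one-character words, non-existent words) apply to this sketch as well.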

3.2.3 Statistical Parsing 

Statistical NLP parsing [JM00, Mar03, The10] can help disambiguate composite words and even
find sentence boundaries. It can be used as a refinement tool for either of the methods presented
above. Finding the longest span of words that has a valid parse gives the sentence
boundary. If no parse is found at all, then either the word boundaries of some words were not detected
properly, or the words are not in the dictionary/grammar, or the source text does not consist of
well-formed English sentences.



TODO 



Chapter 4 

Applications 



Revision : 1.2 

This chapter describes the application that employs the Cryptolysis framework. Its source code
revision is quoted in the Appendix and is maintained on SourceForge as resources permit.



TODO 
Chapter 5



Results 



Revision : 1.2 

The results are found in the aux folder of the arXiv submission and have not yet been integrated into
the document as tables and graphs. Please consult the files in that directory.
Chapter 6



Conclusions 



Revision : 1.2 

We have built a simple, expandable framework for verifying attacks on classical ciphers that also
serves as a platform for comparative experiments. We obtained some encouraging results by recovering
the keys in the implemented attack scenarios. We plan to extend the testing environment with other
algorithms and frameworks and to improve integration with other frameworks.

6.1 Acknowledgments 

Dr. Amr M. Youssef and Faculty of Engineering and Computer Science, Concordia University, Montreal, 
Canada. 
Appendix A

Cryptolysis Application Source Code 



package cryptolysis.apps;

import java.io.FileInputStream;
import java.io.InputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;

import marf.MARF;
import marf.Storage.IStorageManager;
import marf.util.Debug;
import marf.util.OptionProcessor;

import cryptolysis.analysis.GeneticAlgorithm;
import cryptolysis.analysis.ICryptoAnalyzer;
import cryptolysis.analysis.LanguageStatistics;
import cryptolysis.analysis.MatchednessChecker;
import cryptolysis.analysis.SimulatedAnnealing;
import cryptolysis.analysis.Statistical;
import cryptolysis.analysis.StatisticsBuilder;
import cryptolysis.analysis.TabuSearch;
import cryptolysis.ciphers.ICipher;
import cryptolysis.ciphers.SPN;
import cryptolysis.ciphers.Shift;
import cryptolysis.ciphers.Substitution;
import cryptolysis.util.CharacterNgramBasedParser;
import cryptolysis.util.CryptoParser;
import cryptolysis.util.CryptolysisException;
import cryptolysis.util.DictionaryBasedParser;



/**
 * <p>Main Cryptolysis Application.</p>
 *
 * TODO: document
 *
 * $Id: Cryptolysis.java,v 1.19 2005/11/10 10:25:22 mokhov Exp $
 *
 * @author Serguei Mokhov
 * @author Marc-Andre Laverdiere
 */
public class Cryptolysis
{
	// A bunch of options
	// TODO: document

	public static final int OPT_HELP_SHORT = 1;
	public static final int OPT_HELP_LONG = 2;
	public static final int OPT_VERSION = 3;

	public static final int OPT_CRYPTOANALYZE = 4;
	public static final int OPT_PARSE = 11;
	public static final int OPT_TRAIN = 16;

	public static final int OPT_ENCRYPT = 5;
	public static final int OPT_DECRYPT = 6;

	public static final int OPT_SHIFT_CIPHER = 7;
	public static final int OPT_SUBSTITUTION_CIPHER = 8;
	public static final int OPT_SPN_CIPHER = 9;
	// NOTE: shares the value 16 with OPT_TRAIN in this revision.
	public static final int OPT_KEY = 16;

	public static final int OPT_FILENAME = 10;

	public static final int OPT_STATS = 12;
	public static final int OPT_GENETIC = 13;
	public static final int OPT_ANNEAL = 14;
	public static final int OPT_TABU = 15;

	public static final int OPT_LANG_STATS = 17;
	public static final int OPT_DICT_PARSER = 18;
	public static final int OPT_NGRAM_PARSER = 19;

	public static final int OPT_DEBUG = 20;

	public static final int OPT_TARGET_FILE = 21;
	public static final int OPT_CHECK_MATCHEDNESS = 22;

	/**
	 * @param argv command-line parameters
	 */
	public static final void main(String[] argv)
	{
		try
		{
			//Debug.enableDebug(true);

			OptionProcessor oGetOpt = new OptionProcessor();

			oGetOpt.addValidOption(OPT_HELP_SHORT, "-h");
			oGetOpt.addValidOption(OPT_HELP_LONG, "--help");
			oGetOpt.addValidOption(OPT_VERSION, "--version");
			oGetOpt.addValidOption(OPT_DEBUG, "--debug");

			oGetOpt.addValidOption(OPT_CRYPTOANALYZE, "--analyze");
			oGetOpt.addValidOption(OPT_ENCRYPT, "--encrypt");
			oGetOpt.addValidOption(OPT_DECRYPT, "--decrypt");
			oGetOpt.addValidOption(OPT_PARSE, "--parse");
			oGetOpt.addValidOption(OPT_TRAIN, "--train");

			oGetOpt.addValidOption(OPT_SHIFT_CIPHER, "-shift");
			oGetOpt.addValidOption(OPT_SUBSTITUTION_CIPHER, "-subst");
			oGetOpt.addValidOption(OPT_SPN_CIPHER, "-spn");

			oGetOpt.addValidOption(OPT_STATS, "-stats");
			oGetOpt.addValidOption(OPT_ANNEAL, "-anneal");
			oGetOpt.addValidOption(OPT_GENETIC, "-genetic");
			oGetOpt.addValidOption(OPT_TABU, "-tabu");

			oGetOpt.addValidOption(OPT_LANG_STATS, "-lang");
			oGetOpt.addValidOption(OPT_DICT_PARSER, "-dict");
			oGetOpt.addValidOption(OPT_NGRAM_PARSER, "-ngram");

			oGetOpt.addValidOption(OPT_CHECK_MATCHEDNESS, "--checkmatch");

			int iValidOptions = oGetOpt.parse(argv);

			if(oGetOpt.isActiveOption(OPT_DEBUG))
			{
				Debug.enableDebug();
			}

			if
			(
				oGetOpt.isActiveOption(OPT_HELP_SHORT)
				|| oGetOpt.isActiveOption(OPT_HELP_LONG)
			)
			{
				usage();
				System.exit(0);
			}

			if(oGetOpt.isActiveOption(OPT_VERSION))
			{
				System.out.println
				(
					"Cryptolysis $Revision: 1.19 $\n"
					+ "Using MARF v." + MARF.getVersion()
				);

				System.exit(0);
			}

			// Arguments not recognized as options are positional:
			// the file name, the key, and the target file.
			switch(oGetOpt.getInvalidOptions().size())
			{
				case 0:
					break;

				// Shall be the filename
				case 1:
				{
					oGetOpt.addActiveOption(OPT_FILENAME, oGetOpt.getInvalidOptions().firstElement().toString());
					oGetOpt.getInvalidOptions().clear();
					break;
				}

				// Shall be the filename and the key
				case 2:
				{
					if(oGetOpt.isActiveOption(OPT_CHECK_MATCHEDNESS))
					{
						oGetOpt.addActiveOption(OPT_FILENAME, oGetOpt.getInvalidOptions().firstElement().toString());
						oGetOpt.addActiveOption(OPT_TARGET_FILE, oGetOpt.getInvalidOptions().elementAt(1).toString());
						oGetOpt.getInvalidOptions().clear();
					}
					else
					{
						oGetOpt.addActiveOption(OPT_FILENAME, oGetOpt.getInvalidOptions().firstElement().toString());
						oGetOpt.addActiveOption(OPT_KEY, oGetOpt.getInvalidOptions().elementAt(1).toString());
						oGetOpt.getInvalidOptions().clear();
					}

					break;
				}

				case 3:
					oGetOpt.addActiveOption(OPT_FILENAME, oGetOpt.getInvalidOptions().firstElement().toString());
					oGetOpt.addActiveOption(OPT_KEY, oGetOpt.getInvalidOptions().elementAt(1).toString());
					oGetOpt.addActiveOption(OPT_TARGET_FILE, oGetOpt.getInvalidOptions().elementAt(2).toString());
					oGetOpt.getInvalidOptions().clear();
					break;

				default:
					throw new CryptolysisException
					(
						"Invalid options found: " + oGetOpt.getInvalidOptions()
					);
			}

			InputStream oInputText = null;
			InputStream oInputTextCompare = null;
			OutputStream oOutputText = null;

			// Get input from either a file name if specified, or STDIN
			if(oGetOpt.isActiveOption(OPT_FILENAME))
			{
				oInputText = new FileInputStream(oGetOpt.getOption(OPT_FILENAME));
			}
			else
			{
				oInputText = System.in;
			}

			// Write to the target file if one is given, or to STDOUT
			if(!oGetOpt.isActiveOption(OPT_CHECK_MATCHEDNESS) &&
				oGetOpt.isActiveOption(OPT_TARGET_FILE))
			{
				oOutputText = new FileOutputStream(oGetOpt.getOption(OPT_TARGET_FILE));
			}
			else
			{
				oOutputText = System.out;
			}

			// When checking matchedness, the target file is the second input
			if(oGetOpt.isActiveOption(OPT_CHECK_MATCHEDNESS) &&
				oGetOpt.isActiveOption(OPT_TARGET_FILE))
			{
				oInputTextCompare = new FileInputStream(oGetOpt.getOption(OPT_TARGET_FILE));
			}

			// Assume defaults
			if(iValidOptions == 0)
			{
				// Run statistical cryptanalysis assuming STDIN
				oGetOpt.addActiveOption(OPT_CRYPTOANALYZE, "--analyze");
				oGetOpt.addActiveOption(OPT_STATS, "-stats");
			}

			Debug.debug("Final set of options: " + oGetOpt);

			if(oGetOpt.isActiveOption(OPT_CRYPTOANALYZE))
			{
				ICryptoAnalyzer oCryptanalysis = null;

				// Pick a heuristic strategy
				if(oGetOpt.isActiveOption(OPT_ANNEAL))
				{
					oCryptanalysis = new SimulatedAnnealing();
				}
				else if(oGetOpt.isActiveOption(OPT_TABU))
				{
					oCryptanalysis = new TabuSearch();
				}
				else if(oGetOpt.isActiveOption(OPT_GENETIC))
				{
					oCryptanalysis = new GeneticAlgorithm();
				}
				else
				{
					oCryptanalysis = new Statistical();
				}

				// Parse
				if(oGetOpt.isActiveOption(OPT_PARSE))
				{
					Debug.debug("Parsing text...");

					//String strOutfile = oCryptanalysis.getClass().getName() + ".parsed.txt"
					//FileOutputStream oFOS = new FileOutputStream(strOutfile);
					//oCryptanalysis.analyze(oInputText, oFOS);
					//oFOS.close();

					//FileInputStream oFIS = new FileInputStream(strOutfile);
					FileInputStream oFIS = new FileInputStream(oGetOpt.getOption(OPT_FILENAME));
					CryptoParser oParser = new DictionaryBasedParser(oFIS);
					oParser.parse();
				}
				else
				{
					Debug.debug("Analyzing text...");

					// Re-load stored trained stats.
					LanguageStatistics oStats = new LanguageStatistics();
					oStats.setDumpMode(IStorageManager.DUMP_GZIP_BINARY);
					oStats.setFilename(oStats.getClass().getName() + "." + oStats.getDefaultExtension());
					oStats.restore();

					//Debug.debug(oStats);

					oCryptanalysis.setStatistics(oStats);
					oCryptanalysis.analyze(oInputText, System.out);
				}
			}



			/*
			 * Encryption
			 */
			else if(oGetOpt.isActiveOption(OPT_ENCRYPT))
			{
				ICipher oCipher = null;

				if(oGetOpt.isActiveOption(OPT_SPN_CIPHER))
				{
					oCipher = new SPN();
				}
				else if(oGetOpt.isActiveOption(OPT_SUBSTITUTION_CIPHER))
				{
					oCipher = new Substitution();

					// Prepare key
					String strPermutationKey = oGetOpt.getOption(OPT_KEY);
					char[] acKey = strPermutationKey.toCharArray();
					char[] acOriginal = new char[acKey.length];

					// Assumes English characters only for now
					int iBase = 'A';

					for(int i = 0; i < acKey.length; i++)
					{
						acOriginal[i] = (char)(iBase + i);
					}

					((Substitution)oCipher).setPermutation(acOriginal, acKey);
				}
				else
				{
					oCipher = new Shift();
				}

				//System.out.println("Encrypting text...");
				oCipher.encrypt(oInputText, oOutputText);
				//System.out.println("Encryption done.");
			}

			/*
			 * Decryption
			 */
			else if(oGetOpt.isActiveOption(OPT_DECRYPT))
			{
				ICipher oCipher = null;

				if(oGetOpt.isActiveOption(OPT_SPN_CIPHER))
				{
					oCipher = new SPN();
				}
				else if(oGetOpt.isActiveOption(OPT_SUBSTITUTION_CIPHER))
				{
					oCipher = new Substitution();
				}
				else
				{
					oCipher = new Shift();
				}

				oCipher.decrypt(oInputText, oOutputText);
			}

			/*
			 * Training
			 */
			else if(oGetOpt.isActiveOption(OPT_TRAIN))
			{
				Debug.debug("Training...");

				if(oGetOpt.isActiveOption(OPT_LANG_STATS))
				{
					Debug.debug("Training language statistics.");

					StatisticsBuilder oBuilder = new StatisticsBuilder();
					LanguageStatistics oStats = oBuilder.createStatistics();

					oBuilder.addToStatistics(oStats, oInputText);

					// Serialize stats data to be reloaded
					// in the future when running an attack.
					// It is dumped gzip compressed with the name of the
					// class as a filename with a .gzbin extension.
					oStats.setDumpMode(IStorageManager.DUMP_GZIP_BINARY);
					oStats.setFilename(oStats.getClass().getName() + "." + oStats.getDefaultExtension());
					oStats.dump();

					// Prints out a text representation of the language statistics object
					System.out.println(oStats);
				}
				else if(oGetOpt.isActiveOption(OPT_DICT_PARSER))
				{
					Debug.debug("Training dictionary-based parser.");

					DictionaryBasedParser oParser = new DictionaryBasedParser(oInputText);
					oParser.train();
				}
				else if(oGetOpt.isActiveOption(OPT_NGRAM_PARSER))
				{
					Debug.debug("Training N-gram-based parser.");

					CharacterNgramBasedParser oParser = new CharacterNgramBasedParser(oInputText);
					oParser.train();
				}
				else
				{
					throw new Exception("No valid training module type found.");
				}

				Debug.debug("Training done.");
			}

			/*
			 * Check matchedness
			 */
			else if(oGetOpt.isActiveOption(OPT_CHECK_MATCHEDNESS))
			{
				MatchednessChecker oChecker = new MatchednessChecker();

				double result = oChecker.computeMatchedness(oInputText, oInputTextCompare);

				System.out.println
				(
					oGetOpt.getOption(OPT_FILENAME) + " vs " +
					oGetOpt.getOption(OPT_TARGET_FILE) + "\n\tmatching at: " +
					result * 100 + " %"
				);

				System.out.println("Statistics for each monogram: ");

				char[] monograms = oChecker.getTrackedMonograms();

				for(int i = 0; i < monograms.length; i++)
				{
					System.out.println
					(
						"\t" + monograms[i] + ": " +
						oChecker.getMonogramMatchednessStatistics(monograms[i]) * 100 + "%"
					);
				}
			}
		}
		catch(Exception e)
		{
			System.err.println(e.getMessage());
			e.printStackTrace(System.err);
			System.exit(1);
		}
	}

	public static final void usage()
	{
		System.out.println
		(
			"Cryptolysis $Revision: 1.19 $\n"
			+ "A Cryptanalysis Framework for Classical Ciphers\n"
			+ "Author: CIISE Security Investigation Initiative\n\n"

			+ "Usage:\n\n"

			+ "  java Cryptolysis --help | -h\n"
			+ "     displays usage information\n\n"

			+ "  java Cryptolysis --version\n"
			+ "  java Cryptolysis [ OPTIONS ] [ --debug ]\n"
			+ "     does cryptoanalysis and cryptography-related tasks\n\n"

			+ "Where options are one of the following:\n\n"
			+ "  --analyze [ --parse ] [ ATTACK ] [ FILENAME ]\n"
			+ "  --encrypt [ CIPHER ] [ FILENAME KEY TARGETFILE ]\n"
			+ "  --decrypt [ CIPHER ] [ FILENAME KEY TARGETFILE ]\n"
			+ "  --train STATSMODULE [ FILENAME ]\n"
			+ "  --checkmatch FILENAME FILENAME\n\n"

			+ "Where ATTACK is one or more of the following:\n\n"
			+ "  -stats     Statistical N-gram Model Attack\n"
			+ "  -genetic   Genetic Algorithm Attack\n"
			+ "  -anneal    Simulated Annealing Attack\n"
			+ "  -tabu      Tabu Search Attack\n\n"

			+ "Where CIPHER is one or more of the following:\n\n"
			+ "  -shift     Shift Cipher\n"
			+ "  -subst     Substitution Cipher\n"
			+ "  -spn       SPN Cipher\n\n"

			+ "Where STATSMODULE is one of the following:\n\n"
			+ "  -dict      Dictionary-based\n"
			+ "  -ngram     Character N-gram based\n"
			+ "  -lang      Language N-gram statistics\n"
		);
	}
}

// EOF